Multi-class composite N-gram based on connection direction
Authors
Abstract
A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting a word's neighboring characteristics into the word-preceding and word-following directions, multiple (two-dimensional) word classes are assigned to each word. On each side, word classes are merged into larger clusters independently, according to the preceding or following word distributions. This clustering yields more efficient and statistically reliable word clusters. We further extend it to the Multi-Class Composite N-gram, whose units are Multi-Class 2-grams and joined words. The Multi-Class Composite N-gram showed better performance in both perplexity and recognition rate, with a size one thousandth smaller than that of conventional word 2-grams.
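The direction-specific classes described in the abstract can be illustrated with a minimal sketch. It assumes one common way of writing a class 2-gram with two classes per word: P(w_i | w_{i-1}) ≈ P(Cp(w_i) | Ch(w_{i-1})) · P(w_i | Cp(w_i)), where Ch groups words by what follows them and Cp groups words by what precedes them. The names `hist_class`, `pred_class`, `train` and `bigram_prob`, the toy corpus, and the hand-assigned classes are all hypothetical; the abstract does not give the clustering procedure itself, so no automatic merging is shown here.

```python
from collections import Counter

# Toy corpus (hypothetical). "<s>" marks the sentence start.
sentences = [
    ["<s>", "a", "dog", "runs"],
    ["<s>", "an", "apple", "falls"],
    ["<s>", "a", "cat", "runs"],
]

# Hand-assigned classes (assumption; the paper derives these by clustering).
# hist_class: class of a word when it is the history, grouped by what follows it.
hist_class = {"<s>": "START", "a": "ART_C", "an": "ART_V",
              "dog": "NOUN", "cat": "NOUN", "apple": "NOUN",
              "runs": "VERB", "falls": "VERB"}
# pred_class: class of a word when it is predicted, grouped by what precedes it.
pred_class = {"<s>": "START", "a": "ART", "an": "ART",
              "dog": "NOUN_C", "cat": "NOUN_C", "apple": "NOUN_V",
              "runs": "VERB", "falls": "VERB"}


def train(corpus):
    """Collect the counts needed for the two factors of the class 2-gram."""
    pair = Counter()   # (history class, predicted class)
    hist = Counter()   # history class
    word = Counter()   # predicted word
    pred = Counter()   # predicted class
    for sent in corpus:
        for prev, cur in zip(sent, sent[1:]):
            pair[(hist_class[prev], pred_class[cur])] += 1
            hist[hist_class[prev]] += 1
            word[cur] += 1
            pred[pred_class[cur]] += 1
    return pair, hist, word, pred


def bigram_prob(cur, prev, model):
    """Unsmoothed estimate of P(cur | prev) from class-level counts."""
    pair, hist, word, pred = model
    ch, cp = hist_class[prev], pred_class[cur]
    if hist[ch] == 0 or pred[cp] == 0:
        return 0.0
    return (pair[(ch, cp)] / hist[ch]) * (word[cur] / pred[cp])


model = train(sentences)
print(bigram_prob("dog", "a", model))
print(bigram_prob("apple", "a", model))
print(bigram_prob("a", "<s>", model))
```

In this toy setup "a" and "an" share one class when they are predicted, because both follow the same contexts, but fall into different classes when they serve as history, because different nouns follow them. A single class per word could not capture both facts at once, which is the motivation the abstract gives for splitting by connection direction.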
Similar resources
Multi-Class Composite N-gram Language Model for Spoken Language Processing Using Multiple Word Clusters
In this paper, a new language model, the Multi-Class Composite N-gram, is proposed to avoid the data sparseness problem for spoken language, for which it is difficult to collect training data. The Multi-Class Composite N-gram maintains an accurate word prediction capability and reliability for sparse data with a compact model size based on multiple word clusters, called Multi-Classes. In the Multi-Cl...
Multi-class composite n-gram language model using multiple word clusters and word successions
In this paper, a new language model, the Multi-Class Composite N-gram, is proposed to avoid the data sparseness problem that arises with a small amount of training data. The Multi-Class Composite N-gram maintains an accurate word prediction capability and reliability for sparse data with a compact model size based on multiple word clusters, so-called Multi-Classes. In the Multi-Class, the statistical connectivity...
New language models using phrase structures extracted from parse trees
This paper proposes a new speech recognition scheme using three linguistic constraints. Multi-class composite bigram models [1] are used in the first and second passes to reflect word-neighboring characteristics as an extension of conventional word n-gram models. Trigram models with constituent boundary markers and word pattern models are both used in the third pass to utilize phrasal constrain...
A class of multi-agent discrete hybrid non linearizable systems: Optimal controller design based on quasi-Newton algorithm for a class of sign-undefinite hessian cost functions
In the present paper, a class of hybrid, nonlinear and non linearizable dynamic systems is considered. The noted dynamic system is generalized to a multi-agent configuration. The interaction of agents is presented based on graph theory and finally, an interaction tensor defines the multi-agent system in leader-follower consensus in order to design a desirable controller for the noted system. A...
Microleakage comparison of three types of adhesive systems versus GIC-based adhesive in class V composite restorations
Background and aims: New dentin bonding agents and techniques have been developed to reduce microleakage and create higher bond strength. This in-vitro study compared the microleakage of three resin-based adhesives versus a GIC-based adhesive on class V composite restorations. Materials and Methods: Class V cavities were prepared on the buccal surfaces of 72 sound premolars, randomly assigned ...